Ophthalmology Science
Elsevier BV
Preprints posted in the last 7 days, ranked by how well they match Ophthalmology Science's content profile, based on 20 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Bolo, K.; Wong, B.; Do, J.; Ambite, J.-L.; Li, Z.; Kesselman, C.; Daskivich, L.; Xu, B.
Purpose: To evaluate the incidence and baseline predictors of intraocular pressure (IOP)-lowering treatment following detection of referable glaucoma by teleretinal screening. Design: Retrospective cohort study. Methods: Participants were derived from a safety-net teleretinal diabetic retinopathy screening program (2013-2024). Participants included individuals who screened positive for referable glaucoma (cup-to-disc ratio [CDR] ≥0.6 or CDR asymmetry ≥0.2) and completed in-office diagnostic evaluation. The primary outcome was initiation of IOP-lowering treatment (medication, laser, or surgery) and the secondary outcome was intervention with surgery. Cumulative incidence functions were estimated, accounting for loss to follow-up. Fine-Gray models were used to identify baseline screening predictors to risk stratify each outcome. Glaucoma diagnosis was approximated using diagnostic codes and chart review. Results: 2,367 participants were included. The cumulative incidence of treatment was 19.6% (95% CI: 18.0-21.2) at Year 1 and 45.1% (42.1-48.1) at Year 8. Early treatment occurred primarily in glaucoma cases, whereas treatment accumulated longitudinally in glaucoma suspects, reaching 36.5% (31.6-41.5) by Year 8. Surgery was less common (8-year incidence: 5.3%). Baseline screening data predicted treatment and surgery, enabling risk stratification. At Year 8, cumulative incidence differed substantially between high- and low-risk groups (treatment: 59.9% vs. 31.2%; surgery: 9.7% vs. 1.0%). Older age (sub-distribution hazard ratio [SHR] 1.03 per year, p<0.001), Black race (SHR 1.50, p<0.001), and personal history of glaucoma (SHR 1.90, p<0.001) were associated with treatment; Asian race was protective (SHR 0.71, p=0.03). Older age (SHR 1.06, p<0.001), worse visual acuity (SHR 5.11 per logMAR unit, p<0.001), and screening at a hospital-based site (SHR 2.46, p=0.003) were associated with surgical treatment.
Conclusion: Nearly half of safety-net diabetic patients screening positive for referable glaucoma initiated IOP-lowering treatment over 8 years, while few received surgery. Baseline screening characteristics enabled risk stratification of treatment and surgery. These findings address an evidence gap about longitudinal consequences of screening and suggest that its impact extends beyond detection of prevalent glaucoma to include identification of high-risk glaucoma suspects who warrant ongoing surveillance.
Spencer, G. M.; Karim, K.; Dzioba, A.; Graham, M. E.; You, P.; Hummel, T.; Gellrich, J.; Coyle, P.; Burns, H.; Peer, S.; Zawawi, F.; Lechien, J. R.; Schriever, V. A.; Bhargava, E. K.; Whitcroft, K. L.
Background: Olfactory dysfunction (OD) in children remains underdiagnosed and poorly characterised. Despite its known impacts on nutrition, quality of life, safety awareness, and psychosocial development, no standardised diagnostic or management pathway currently exists for paediatric OD. This study aimed to characterise global practice patterns and identify diagnostic and therapeutic challenges unique to paediatric care. Methodology/Principal: A 44-item cross-sectional online survey was distributed to a verified international network of paediatric otolaryngologists across 36 countries via a closed professional platform. The survey assessed five domains: diagnostic practices, management protocols, technology and innovation, education and training, and barriers to effective care. Regional grouping was used to facilitate meaningful statistical comparisons. Categorical variables were evaluated using chi-square tests, with odds ratios and 95% confidence intervals reported for significant findings. Results: Of 351 potential participants, 167 responded (47.6% response rate). Most respondents (83%) reported seeing children with OD, yet 95% saw fewer than ten such patients annually. Psychophysical testing was never performed by 54.8% of respondents, while 88.4% routinely ordered cross-sectional imaging. Testing frequency increased significantly with patient age (Cochran's Q p<0.001). The most common barriers to objective testing were insufficient training (44.3%), time constraints (29.9%), and funding limitations (28.1%). Multidisciplinary collaboration was negligible. Significant regional variation was observed across most practice domains. Conclusions: Paediatric OD care is characterised by functional underinvestigation, fragmented multidisciplinary collaboration, and systemic educational gaps. These findings support urgent development of standardised clinical guidelines, age-appropriate validated assessment tools, and formal interdisciplinary care pathways.
Liu, Y.-S.; Dou, X.-W.; Zheng, P.-Y.; Feng, W.; Ma, L.-J.; You, Y.-N.; Shao, G.-W.; Shen, J.-G.; Yu, X.; Qiao, C.; Cheng, Z.-W.; Li, Z.-W.; Su, F.; Zhang, B.-W.; Qu, X.-H.; Jiang, G.
Background: Treatment decisions for carotid atherosclerotic disease rely primarily on luminal stenosis, although plaque vulnerability and symptomatic status better reflect short-term cerebrovascular risk. A scalable CTA tool for automated phenotyping of symptomatic carotid disease is lacking. Materials & Methods: In this multi-institutional retrospective study, 689 patients (mean age, 67.9 ± 7.7 years; 366 men) from four hospitals were analyzed after screening 705 CTA examinations. 423 patients from one center were used for five-fold development and internal validation, and 266 patients from three centers for independent external validation. CarotidMamba, a deep learning framework combining dual foundation-model encoders with Mamba-based sequence modeling, was developed and benchmarked against clinical, radiomics, clinic-radiomics, CNN, and transformer comparators. Results: In the development cohort, CarotidMamba achieved an AUC of 0.839 (95% CI, 0.799-0.879) and accuracy of 0.825 (95% CI, 0.793-0.857), outperforming the strongest comparator by 0.066 and 0.050, respectively. External validation yielded AUCs of 0.897 (95% CI, 0.835-0.959) in YCH, 0.809 (95% CI, 0.720-0.898) in DCH, and 0.762 (95% CI, 0.649-0.875) in GH-NTC. CarotidMamba showed the lowest Brier score and expected calibration error across cohorts, with calibration slopes near 1.0. Conclusion: CarotidMamba provides an interpretable, clinically oriented, and externally validated CTA framework for phenotyping symptomatic carotid plaques, supporting vulnerability-aware imaging assessment beyond stenosis alone.
Landry, T. C.; Kim, Y.
Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen's κ 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.
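The abstract above reports confidence intervals "estimated by resampling patients at a fixed seed". A minimal sketch of that idea follows, assuming a percentile bootstrap over patients; the function name `patient_bootstrap_ci`, the number of resamples, and the toy per-patient counts are illustrative assumptions and are not taken from the preprint.

```python
import random


def patient_bootstrap_ci(cycles_by_patient, n_boot=2000, seed=0, alpha=0.05):
    """Percentile bootstrap CI for a pooled proportion, resampling whole patients.

    cycles_by_patient: list of (positive_cycles, total_cycles) tuples, one per
    patient, each with total_cycles > 0. Resampling patients (not cycles) keeps
    the within-patient correlation of cuff cycles intact.
    """
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    patients = list(cycles_by_patient)
    estimates = []
    for _ in range(n_boot):
        # Draw as many patients as in the cohort, with replacement.
        sample = [rng.choice(patients) for _ in patients]
        positives = sum(pos for pos, _ in sample)
        total = sum(n for _, n in sample)
        estimates.append(positives / total)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy cohort: (detector-positive cycles, analyzable cycles) per patient.
toy_cohort = [(3, 40), (0, 55), (10, 60), (1, 30)]
print(patient_bootstrap_ci(toy_cohort, n_boot=500, seed=7))
```

Because a few patients can contribute most of the positive cycles, a patient-level interval of this kind is typically wider than one that treats every cycle as independent.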
Hudson, G. R.; Khan, D. Z.; Fayez, F.; Bhatia, S.; Bano, S.; Costanza, E.; Blandford, A.; Stoyanov, D.; McCulloch, P.; Marcus, H. J.; University College London Collaborators,
Background: Endoscopic endonasal transsphenoidal surgery (EETS) requires navigation around neurocritical anatomy. Today, artificial intelligence clinical decision support systems (AI-CDSSs) can orientate surgeons, but clinician trust in AI remains unclear, limiting safe deployment. This study evaluates how modifiable design affects trust and performance in a real-world pituitary surgery AI-CDSS. Method: In an online study, 70 clinicians with pituitary surgery experience were randomised evenly to a Basic or Enhanced AI-CDSS that outlines the sella on EETS operative video. The Enhanced group additionally received an explanation of the model and previous publications, alongside confidence labels depicting outline reliability. Both groups annotated the sella on six video clips, first alone and then with the optional AI-CDSS. Clips were ordered by declining AI performance, except for the final clip. Self-reported trust was measured using a 1-7 scale after each annotation, and performance was the Dice overlap between user annotations and the ground truth. Comparisons used Mann-Whitney U and permutation analysis. Results: Sixty-four participants (91%) finished the exercise (31 Basic, 33 Enhanced). When AI performed best, median trust was 5.00 in both arms (U=559, p=.521). However, when AI performed worst, trust was significantly lower for the Enhanced group (3.00 vs 3.67, U=668, p=.035), a difference sustained in the final clip (3.67 vs 4.33, U=687, p=.019). User performance improved with the AI-CDSS, but with no significant difference between the groups on the best or worst AI performing clips. Nevertheless, for the best AI, senior clinicians had higher median performance in the Enhanced group (0.95 vs 0.90, U=75, p=.066). There was also less dispersion in the Enhanced group when AI was inaccurate (IQR: 0.07 vs 0.21, p=.004).
Conclusion: Interface design can improve trust calibration in a surgical AI-CDSS and may improve performance among senior clinicians when the AI is accurate, and consistency when the AI is inaccurate. In future, these features may form important safety checks during translation to the operating room.
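Performance in the abstract above is the Dice overlap between a user's annotation and the ground truth. A minimal sketch of that metric on binary masks follows; representing masks as nested lists and treating two empty masks as perfect agreement are assumptions made here for illustration, not details from the study.

```python
def dice_overlap(pred, truth):
    """Dice coefficient between two binary masks given as equal-shaped nested lists."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x2 example: annotation covers two pixels, ground truth covers one of them.
annotation = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
print(dice_overlap(annotation, ground_truth))  # 2 * 1 / (2 + 1), about 0.667
```

A score of 1.0 means the outlines coincide exactly and 0.0 means they do not overlap at all, which is the scale on which the reported medians of 0.95 and 0.90 sit.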
Krooss, S. A.; Yang, T.; Yuan, Q.; Drick, N.; Sgodda, M.; Held, J.; Behrendt, P.; Hartleben, B.; Koczulla, R.; Ma, X.; Liu, Y.; Wedemeyer, H.; Janciauskiene, S.; Di Donato, N.; Cantz, T.; Wang, E.; Wu, Y.; Hoeper, M.; Xia, Q.; Ott, M.
Background: Alpha-1 antitrypsin deficiency (AATD) caused by the PI*ZZ mutation (Glu342Lys) results in hepatic accumulation of misfolded AAT-Z protein and reduced circulating AAT levels, leading to progressive liver disease and emphysema. Gene correction therapy represents a potentially curative approach by directly correcting the underlying genetic defect. We report the first case of successful hepatic gene correction with early histological and functional assessment. Methods/Case presentation: We report the case of a 66-year-old male patient with PI*ZZ AATD who underwent gene correction therapy within the YOLT-202 phase I/Ia clinical trial (ClinicalTrials.gov ID NCT07193615). Ten weeks post-treatment, a liver biopsy was performed to re-evaluate pre-existing F2 liver fibrosis as measured by elastography before entering the study. Serum samples allowed functional assessment of the AAT-mediated elastase inhibition. Results: Liver biopsy did not show signs of hepatic inflammation and demonstrated 54% (Sanger) and 57% (Illumina) gene correction rate of the PI*ZZ variant on the DNA level with no bystander edits or off-target effects. Following a transient elevation of transaminases during the early post-treatment period, liver enzymes normalized. Monthly serum AAT measurements demonstrated biologically active and stable therapeutic levels throughout follow-up. Conclusions: This case demonstrates efficient and precise hepatic gene correction without concerning histological alterations and with substantial improvement of functional parameters, supporting the feasibility and safety of gene editing approaches for AATD.
Maciaszek, J. L.; Pastor Loyola, V.; Cain, T.; Cardenas, M.; Blackburn, P. R.; Wilkinson, M. R.; Koo, S. C.; Wu, C.-H.; Li, C.; Wang, L.; Nichols, K. E.; Klco, J. M.; Eldomery, M. K.
Purpose: Pathogenic or likely pathogenic (P/LP) variants are increasingly identified in genes more commonly associated with adult-onset cancer predisposition, but their prevalence and relevance to children who present with cancer remain unclear. Methods: We retrospectively analyzed 1,280 consecutive pediatric patients with cancer who underwent clinical germline sequencing, using a virtual panel, from 2021 to 2024. Genes with P/LP variants were categorized as adult-onset cancer predisposition genes (aoCPG) or pediatric-onset cancer predisposition genes (poCPG) according to cancer risk before age 18 years and pediatric surveillance recommendations. Variant relevance was adjudicated using tumor diagnosis/histopathology, immunohistochemistry, and tumor molecular features and classified as primary, secondary, or indeterminate. Results: Among 1,280 patients, 197 (15.4%) harbored 211 P/LP variants across 54 genes. Sixty-six variants (31.3%) occurred in aoCPG, 87 (41.2%) in poCPG, and 58 (27.5%) were heterozygous variants in autosomal recessive genes. Among adult-onset variants, 7 (10.6%) were primary, 54 (81.8%) secondary, and 5 (7.6%) indeterminate. Among pediatric-onset variants, 77 (88.5%) were primary and 10 (11.5%) secondary. Six patients (3 adult-onset variants; 3 pediatric-onset variants) received targeted therapy informed by germline/somatic sequencing results. Conclusion: In pediatric oncology, most variants in aoCPG are secondary rather than tumor-related findings. Tumor-informed interpretation, beyond variant classification, may improve reporting, counseling, and therapeutic decision-making.
Hu, L.; Bass, M.; Patridge, E.; Molusky, M.; Antoine, G.; Vuyisich, M.; Banavar, G.
Background: Chronic diseases and symptom syndromes often develop after prolonged biological changes that may precede formal diagnosis. RNA-based metatranscriptomics captures active microbial and human gene expression and may provide a functional layer for disease risk evaluation. To address this translational gap, we developed and validated a Disease Risk Score (DRS) framework that integrates metatranscriptome-derived pathway activity scores from stool, saliva, and blood samples, and evaluated its potential clinical utility as an adjunct risk-evaluation tool. Methods: DRS uses disease-specific sets of pathway activity scores derived from stool and saliva microbial functions, stool and saliva microbial taxa, and blood human gene expression. For each disease, 'not optimal' pathway scores are aggregated into a normalized cumulative odds ratio, or cOR, using score-level odds ratios, statistical significance, and literature-supported biological relevance derived from a Development Cohort of 22,369 individuals. A cOR ≥ 5 is defined as high risk. Performance is evaluated in an independent Validation Cohort of 15,908 individuals using self-reported diseases as the reference. Disease support requires both significant cOR separation between self-reported and not-reported (Cohen's d ≥ 0.2) and risk ratio enrichment of self-reported disease among individuals classified as high risk (95% CI of Risk Ratio > 1). Results: Of 20 initially evaluated diseases, 15 meet the prespecified validation criteria on the independent validation cohort: ADHD, anxiety, chronic fatigue syndrome, depression, GERD, hypertension, inflammatory bowel disease, IBS-C, IBS-D, insomnia, MASLD, obesity, obstructive sleep apnea, Sjögren's syndrome, and type 2 diabetes.
Five selected clinical scenarios illustrate how DRS can support clinician-mediated decision making, including IBS subtype reclassification, improved diagnostic acceptance in IBS-D, personalized lifestyle counseling in MASLD and early type 2 diabetes, and diagnostic uncertainty in atypical GERD. Conclusions: DRS is a metatranscriptomics-based risk-stratification framework that aggregates active microbial and human pathway signals into interpretable disease-specific risk estimates across a wide range of disease conditions. Validation against self-reported disease labels in an independent cohort shows significant risk enrichment for each of 15 diseases. DRS is intended as an adjunct to clinical evaluation: a decision support tool in situations where routine care encounters uncertainty, delay, or low patient engagement. Future prospective studies using clinically adjudicated endpoints are needed to assess calibration and clinical outcomes.
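One of the validation criteria above is that the 95% CI of the risk ratio for self-reported disease in the high-risk group lies above 1. A minimal sketch of that check follows, assuming the standard log-transformed (Katz) Wald interval; the abstract does not say which interval method was used, and the function name and counts below are hypothetical.

```python
import math


def risk_ratio_ci(cases_high, n_high, cases_other, n_other, z=1.96):
    """Risk ratio of high-risk vs. other individuals with a log-scale Wald CI.

    Assumes every count is positive so that the logarithm and the standard
    error are defined.
    """
    risk_high = cases_high / n_high
    risk_other = cases_other / n_other
    rr = risk_high / risk_other
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / cases_high - 1 / n_high + 1 / cases_other - 1 / n_other
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30 of 100 high-risk and 10 of 100 other individuals report the disease.
rr, lower, upper = risk_ratio_ci(30, 100, 10, 100)
print(rr, lower, upper)
print("enriched" if lower > 1 else "not enriched")
```

Requiring the lower bound, rather than the point estimate, to exceed 1 guards against calling a disease "supported" on the basis of a noisy ratio from a small high-risk group.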
Pongmala, C.; Roytman, S.; van Emde Boas, M.; Vangel, R.; Rosano, C.; Bohnen, N.
Background Slow walking in older adults with mild parkinsonian signs (MPS) is a complex, multifactorial phenomenon arising from the cumulative burden of subclinical age-associated pathologies. This decline reflects age-associated neuronal loss in the dopaminergic system. A recent study suggests that levodopa treatment may enhance gait parameters. The goal of this small pilot study is to explore the effect of levodopa treatment on slow walking gait in older adults with MPS. Method This study was a randomized, placebo-controlled clinical pilot trial. Slow walking older adults without clinical evidence of PD were recruited and randomized into 2 groups (active treatment group or placebo control group). Participants in the active group were pre-treated with carbidopa for three days, followed by carbidopa-levodopa for seven days. Spatiotemporal gait parameters were evaluated at baseline and post-intervention. Results Gait factor analysis identified three main factors explaining gait characteristics at baseline, which included gait efficiency, gait rhythmicity, and gait turning. No effect of treatment was observed in the placebo group (p=0.111, p=0.616), no group difference was observed between the placebo and active group at baseline (β=0.310, p=0.547), but a strong trend for a treatment-related increase was observed in the active treatment group (β=0.506, p=0.076). Conclusion Our preliminary data suggest that sustained levodopa treatment (one week) in conjunction with carbidopa pre-treatment and concomitant carbidopa supplementation is feasible in slow walking older adults with MPS. Moreover, the data indicate potential efficacy, showing improvements in cadence and step duration.
Landry, T. C.; Kim, Y.
Background. Capillary refill time, an examiner-dependent bedside test of distal microvascular perfusion, has become a resuscitation target in septic shock [1-4], motivating a continuous surrogate computed from the photoplethysmogram (PPG, the optical waveform the pulse oximeter on every ICU patient already records) [5-8]. Objective. We attempted three PPG-derived candidate measures on the MIMIC-IV Waveform Database (MIMIC-IV-WDB v0.1.0) and asked, by inspecting randomly drawn examples, whether each captured its intended physiology before any downstream modeling. Methods. MIMIC-IV-WDB v0.1.0 [9] was linked to MIMIC-IV [10]. The signals were a cuff-anchored perfusion-index recovery (reactive hyperemia when the cuff shares an arm with the probe), a slow Mayer-wave-band power ratio of the perfusion index (sympathetic vasomotor tone), and a per-beat diastolic exponential decay time constant (a refill-like recovery time). For each signal we drew 10 random examples at a fixed seed and checked them against a checklist fixed in advance. Each was read by the author and, separately, by MedGemma 1.5, a multimodal medical language model run locally. A synthetic test with a known time constant checked the third signal. Results. The cuff-anchored signal showed the expected occlusion-reperfusion shape on 268 of 6,236 evaluable cuff cycles (4.30%) in 15 of 19 patients, consistent with opposite-limb placement of the probe and cuff. The slow-band ratio returned a stable cohort value, but a clear, stationary peak appeared in only 4 of 10 random windows. The per-beat fit met its goodness-of-fit threshold in 10 of 10 beats, yet a cardiac-frequency heuristic flagged a possible fit on the heart-rate oscillation in 7 of 10, and in 5 of 17 patients the time constant lay where an exponential is indistinguishable from a straight line. A 0.5 Hz high-pass pre-filter implanted its own approximately 318 ms time constant regardless of truth.
The language model tracked the human on clear positives but reported the pattern present on every call it returned, never absent. Conclusions. Two of the three candidate signals did not reflect their intended physiology in most examples, and the third was constrained by sensor placement. Inspecting a few random raw inputs against a checklist written in advance is an inexpensive upstream check before downstream inference on PPG-derived microvascular signals.
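The figure of roughly 318 ms quoted above matches the time constant of a first-order high-pass filter with a 0.5 Hz cutoff, since tau = 1 / (2 * pi * f_c). The sketch below reproduces that arithmetic; that the pre-filter behaves like a first-order stage is an assumption made here, as the abstract does not state the filter order.

```python
import math


def highpass_time_constant(cutoff_hz):
    """Time constant (seconds) of a first-order high-pass filter: tau = 1 / (2*pi*f_c)."""
    return 1.0 / (2.0 * math.pi * cutoff_hz)


tau_ms = highpass_time_constant(0.5) * 1000.0
print(round(tau_ms, 1))  # 318.3
```

Any exponential fitted to a signal after such a filter can recover the filter's own decay rather than the underlying physiology, which is why a synthetic test with a known time constant is a useful check.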
Collier, A.
Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
Zheng, Y.; Feng, B.; Cheng, R.; Qiu, C.; Long, Z.; Vaziri, K.; Hahn, J.
Accurate assessment of body composition is important to risk stratification and management of metabolic, musculoskeletal, and aging-related diseases, yet reference modalities such as dual-energy X-ray absorptiometry (DXA) are costly and impractical for frequent monitoring. Commodity 3D body scans offer a low-cost, radiation-free alternative, but extracting meaningful and predictive shape features from scans remains challenging due to nonuniform point density, variable body size, and cross-device differences. We introduce BodyMAE, a self-supervised, surface-area aware masked autoencoder for metric-scale 3D body scans. The pipeline integrates area-adjusted sampling, a long-range focused encoder, and a lightweight decoder regularized to promote locally uniform reconstructions. Trained and evaluated on 917 3D body scans paired with clinical DXA reports, BodyMAE achieves strong accuracy on fat percentage (root-mean-square error (RMSE) 3.825 percentage points, R^2 0.908), fat mass (RMSE 3.694 kg, R^2 0.968), and lean mass (RMSE 3.608 kg, R^2 0.901), with competitive performance on bone mineral content (RMSE 0.284 kg, R^2 0.754). We also assess feature stability across pretrained baselines, finding higher retrieval accuracy for our representations (Top-1 90.131%). These results indicate that combining metric-aware sampling, long-range relational encoding, and local geometric regularization enables accurate body composition estimation from 3D body scans, as validated by comparisons to DXA-derived measurements.
Schwoebel, J.; Semenec, I.; Rousseva, J.; Frasch, M. G.; Thorstenson, R.; Bhatt, M.
Large language models embedded in autonomous agents process trusted instructions and untrusted data in one context window, leaving them open to direct and indirect prompt injection. In healthcare this is not hypothetical: a 2025 JAMA Network Open study found commercial medical LLMs followed injected instructions in 94.4% of simulated patient encounters, including life-threatening recommendations. Yet the clinically decisive problem we quantify here is different. Most real clinical threats (protected health information [PHI] exfiltration, cross-patient access, bulk export, out-of-scope advice) are fluent, legitimate-looking requests that carry no attack signal, so even a state-of-the-art injection detector passes them. Existing runtime guardrails trade safety against latency: model-based auditors are accurate but add hundreds of milliseconds of Python inference, while lexical filters are fast but blind to obfuscated or semantically disguised payloads. We present QFIRE, an inline, provider-agnostic prompt firewall implemented as a single self-contained Rust toolchain: proxy, CLI, and benchmark harness. QFIRE combines three mechanisms: (i) positive-security scope constraints, which restrict a model call to a declared natural-language purpose and block out-of-scope drift even when no overt attack token is present; (ii) an asynchronous detector graph that runs N rules and their detector nodes concurrently, cheapest checks first; and (iii) a de-obfuscation pass that decodes Base64, hex, and ROT13, folds homoglyphs and leetspeak, and strips zero-width characters before detection. QFIRE ships 106 versioned firewall rules and a dedicated HIPAA Safe Harbor 18-identifier PHI panel, and runs a local DeBERTa v3 injection classifier via embedded ONNX Runtime.
On 1,968 public prompt-injection and jailbreak prompts, QFIRE's deterministic hybrid attains F1 0.86, statistically tied with Meta's state-of-the-art PromptGuard 2 (0.86) and above protectai DeBERTa v3 (0.83); lexical baselines lag at 0.16 to 0.50. Our central result is on QFIRE HealthBench, a new 2,000-prompt healthcare benchmark we build and release with real garak and Microsoft PyRIT payloads. There the same PromptGuard 2 recovers only 0.40 recall (DeBERTa v3: 0.57), because most clinical threats carry no injection signal; QFIRE's combined scope-plus-PHI chain reaches 0.83 recall (F1 0.87) at a calibrated 0.08 false-positive rate. Generic injection detection, even state of the art, is therefore necessary but not sufficient for healthcare agents. A bare LLM judge also closes most of this static-corpus gap (F1 0.90); QFIRE's contribution beyond static accuracy is auditable determinism, bounded latency, and adaptive robustness, where the bare judge falls to 34 to 59% recall (Section 5.5). End to end, placing QFIRE in front of a tool-using agent over a mock EHR sandbox cuts the agent's harmful action rate from 0.38 to 0.00 at a 0.13 benign utility cost. All code, rules, corpora snapshots, and scripts are released, and every table regenerates from a single make paper target against local models with no paid API keys.
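The de-obfuscation pass described in the abstract above can be illustrated with a short standard-library sketch. This is not QFIRE's code (the tool is described as a Rust toolchain); the function names, the zero-width character set, and the leetspeak table are assumptions made for illustration, and homoglyph folding is omitted. The idea is to produce several decoded "views" of a prompt so that a downstream detector can inspect each one.

```python
import base64
import binascii
import codecs

# Assumed, non-exhaustive set of zero-width characters to strip.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# Assumed, non-exhaustive leetspeak folding table.
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def strip_zero_width(text):
    """Remove zero-width characters that can split a keyword invisibly."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def try_base64(text):
    """Return the UTF-8 text behind a strict Base64 string, or None."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def try_hex(text):
    """Return the UTF-8 text behind a hex string, or None."""
    try:
        return bytes.fromhex(text).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


def candidate_views(text):
    """Return the cleaned prompt plus every decoding that succeeds."""
    cleaned = strip_zero_width(text)
    views = [cleaned, cleaned.translate(LEET), codecs.decode(cleaned, "rot13")]
    for decoder in (try_base64, try_hex):
        decoded = decoder(cleaned.strip())
        if decoded is not None:
            views.append(decoded)
    return views


print(candidate_views("1gn0r3"))       # the leetspeak-folded view reads "ignore"
print(candidate_views("ig\u200bnore"))  # the zero-width space is stripped first
```

Running detection over every view, rather than guessing a single encoding, keeps the pass deterministic at the cost of a few extra cheap checks per prompt.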
Chen, M.; Li, X.; Yang, K.; Taramasso, M.
Background: Transcatheter edge-to-edge repair (TEER) is an established treatment for mitral regurgitation but remains highly dependent on operator experience and complex transesophageal echocardiography (TEE)-guided intraprocedural imaging. Artificial intelligence (AI)-based semantic segmentation may improve procedural reproducibility and intraprocedural guidance; however, no TEER-specific segmentation framework has been reported. Objectives: To develop and evaluate AutoClip, a clinician-driven AI-guided TEE semantic segmentation model designed for simultaneous delineation of mitral valve anatomy and in-vivo TEER device components. Methods: A retrospective proof-of-concept study was conducted using 987 intraprocedural TEE frames derived from 10 video clips in 3 patients undergoing MitraClip G4 implantation. Seven semantic labels, including mitral leaflets and device components, were manually annotated using ITK-SNAP. Following standardized preprocessing and region-of-interest extraction, an Attention U-Net architecture was trained frame-wise on bicommissural and corresponding X-plane TEE views. Model performance was assessed using mean intersection-over-union (IoU) and Dice coefficient on an independent test set. Results: The Attention U-Net demonstrated improved sensitivity to small device structures compared with conventional U-Net architectures. Preliminary training performance achieved a mean IoU of approximately 0.93, while independent test performance reached a mean IoU of 0.46 across foreground classes. Qualitative assessment demonstrated feasible simultaneous segmentation of mitral leaflets, clip arms, grippers, and delivery shaft during TEER procedures. Conclusions: AutoClip represents a proof-of-concept TEER-specific TEE semantic segmentation framework initiated through a clinician-oriented workflow without formal computer science expertise.
Although preliminary accuracy remains modest due to limited sample size, this study establishes a reproducible pathway for future AI-assisted intraprocedural guidance systems and larger multicenter development efforts in structural heart interventions.
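The mean IoU and Dice coefficient reported above can be computed per class and then averaged over the foreground labels. The following is a minimal sketch under stated assumptions, not the AutoClip pipeline: segmentations are represented as flattened lists of integer labels, label 0 is assumed to be background, and the function names are hypothetical.

```python
def iou_and_dice(pred, truth, label):
    """Return (IoU, Dice) for one label, or None if the label is absent
    from both the prediction and the ground truth."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    pred_n = sum(1 for p in pred if p == label)
    truth_n = sum(1 for t in truth if t == label)
    union = pred_n + truth_n - inter
    if union == 0:
        return None
    return inter / union, 2 * inter / (pred_n + truth_n)


def mean_foreground_scores(pred, truth, labels):
    """Average IoU and Dice over the given foreground labels,
    skipping labels that appear in neither mask."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have equal length")
    scores = [s for s in (iou_and_dice(pred, truth, lab) for lab in labels)
              if s is not None]
    if not scores:
        raise ValueError("no foreground label present in either mask")
    mean_iou = sum(s[0] for s in scores) / len(scores)
    mean_dice = sum(s[1] for s in scores) / len(scores)
    return mean_iou, mean_dice


if __name__ == "__main__":
    # Toy 6-pixel example with two foreground classes (illustrative only).
    truth = [0, 1, 1, 2, 2, 2]
    pred = [0, 1, 0, 2, 2, 1]
    print(mean_foreground_scores(pred, truth, labels=[1, 2]))
```

Averaging over foreground classes only, as the abstract specifies, keeps a large and easily segmented background from inflating the score for small structures such as clip arms and grippers.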
Gobeil, E.; Bourgault, J.; Enault, M.; Cote, V.; Mitchell, P. L.; Ruel, L.-J.; Girard, A. S.; Vohl, M.-C.; Arsenault, B. J.
Metabolic dysfunction-associated steatotic liver disease (MASLD) is rapidly increasing worldwide, yet effective targeted therapies remain limited. To better understand the molecular mechanisms underlying MASLD, we performed an integrated proteogenomic analysis of human liver tissue. Using mass spectrometry, we quantified 2,744 proteins in 504 liver biopsies from the Quebec Obesity Biobank and examined changes across disease stages. To investigate causality, we integrated liver proteomics with RNA sequencing and genome-wide genotyping to map thousands of protein quantitative trait loci (pQTLs) and expression quantitative trait loci (eQTLs). These molecular data were combined with summary statistics from a meta-analysis of genome-wide association studies including 16,532 MASLD cases and 1,240,188 controls. Mendelian randomization and genetic colocalization analyses revealed that most proteins differentially expressed across MASLD stages were not causally implicated in disease risk, whereas several genetically predicted liver proteins showed evidence of causal effects. Among these, higher hepatic levels of the MTARC1 protein were causally associated with MASLD and hepatic fat accumulation. Phenome-wide analyses suggested that MTARC1 inhibition may reduce the risk of cirrhosis, hepatocellular carcinoma, and cholelithiasis while improving lipid profiles. Notably, the causal MTARC1 variant influenced liver protein levels but not gene expression. Genetic analyses also identified ERLIN1 and HSD17B13 as potential therapeutic targets. In contrast, eQTLs and pQTLs at other loci such as GCKR showed opposite effects on MASLD risk. These findings highlight the importance of integrating tissue proteomics with human genetics to distinguish biomarkers from causal drivers and to identify promising therapeutic targets for MASLD.
Cantrell, L.; Karampatsas, K.; Andrews, N.; Beach, S.; Bentley, E.; Berardi, A.; Bijlsma, M. W.; Cagil Kocana, C.; Daniel, O.; French, N.; Hall, T.; Izu, A.; Khalil, A.; Kwatra, G.; Kyohere, M.; Madhi, S. A.; Mboizi, R.; Miselli, F.; Nielsen, M.; Thorn, N.; van de Beek, D.; Walker, K.; Heath, P. T.; Le Doare, K.; Voysey, M.; PREPARE WP3 Study Group
Vaccines to prevent infant group B streptococcus (GBS) disease are advancing, with licensure likely based on safety and immunologic endpoints rather than clinical efficacy data. This approach requires robust, generalisable serological thresholds of risk reduction (SToRRs). We combined data from six case-control studies in Europe and Africa to define SToRRs for early-onset (EOD) and late-onset (LOD) GBS disease. Across diverse epidemiological and healthcare settings, anti-capsular polysaccharide IgG concentrations were consistently higher in infants who remained disease free than in those who developed disease. Higher antibody concentrations were required to reduce the risk of EOD than LOD, and higher concentrations were required for serotype Ia than for serotype III. This study provides a quantitative framework to support correlates-based evaluation and potential licensure of maternal GBS vaccines.
Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.
Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.
Shao, W.; Ammerman, B.; Jacobucci, R.
Suicidal risk may be encoded in everyday communication patterns but diluted in routine digital interactions. We introduce a method for surfacing this latent signal: training per-person language model agents on individuals' authored text (the on-screen text each participant typed, captured whenever a keyboard was visible in screenshots) and placing those agents in simulated social interactions, a communicative stress test. Using data from 79 adults with recent suicidal ideation, we fine-tuned individual LoRA adapters on Qwen3-8B using each participant's authored text, then placed agents in standardized conversations with probe personas. Agent-generated risk language was associated with EMA-measured suicidal ideation (r = .576, p < .001), with a single neutral small-talk probe performing nearly as well (r = .551). A shuffle control confirmed the signal is person-specific (r = .071 when adapters were mismatched), and automated descriptions of participants' general smartphone activity produced no signal, confirming specificity to interpersonal communication. A prompt ablation demonstrated partial robustness to removal of disclosure-encouraging language (r = .430). This proof-of-concept demonstrates that simulated social interaction can amplify latent vulnerability signals, bridging digital phenotyping, generative AI, and suicide theory.
Gonzales, M.; Kang, X.; Adamson, M. M.; Chao, S. Z.; Yoon, B. C.
PURPOSE: Alzheimer disease (AD) is associated with cognitive impairment, brain atrophy, and elevated amyloid-beta and tau. The study aimed to characterize regional atrophy associated with elevated amyloid-beta and tau, as measured by [18F]florbetapir (FBP) and [18F]flortaucipir (FTP) positron emission tomography (PET), respectively, and determine whether combining PET and atrophy data improves the prediction of cognitive impairment. METHODS: Alzheimer Disease Neuroimaging Initiative data (n = 381) were retrospectively analyzed. PET results were correlated with cortical thickness, gray matter (GM) volumes, Mini-Mental State Examination, and Montreal Cognitive Assessment. Linear/logistic regression and area under the curve (AUC) were used to evaluate for significant correlations and compare performances in distinguishing cognitive impairment, respectively. RESULTS: Incremental loss of cortical thickness and GM volume was observed from FBP-/FTP- (n = 205) to single PET-positive (FBP+/FTP-, n = 133; FBP-/FTP+, n = 5) and FBP+/FTP+ (n = 38) groups, particularly in the temporal and parietal lobes. FBP+/FTP+ showed the most severe cortical thickness loss in the entorhinal cortex, temporal lobe GM atrophy, and cognitive impairment. Adding brain atrophy as the third variable resulted in higher odds ratios and improved AUCs for cognitive impairment, with FBP+/FTP+/temporal GM or entorhinal cortical atrophy+ demonstrating the strongest associations with cognitive impairment. CONCLUSION: A multimodal approach combining PET and MRI may help improve the assessment of cognitive impairment in AD.
Ernandez, J.; Xiang, L.; Adler, R.; Hsu, J.; Shah, S. K.; Kim, D.; Gershman, B.; Mossanen, M.; Weissman, J. S.
OBJECTIVE: Bladder cancer (BC) is predominantly a disease of older, comorbid adults, and radical cystectomy (RC), which is the gold standard treatment, carries considerable morbidity. We sought to determine the impact of baseline dementia and frailty on the care trajectory beyond the immediate postoperative period. We hypothesized that frail patients and those with dementia undergoing RC for BC will have poorer care trajectories. METHODS AND MATERIALS: We identified Medicare beneficiaries ≥66 years old who underwent RC for BC in 2017 with 12 months of pre- and post-RC enrollment. Frailty and dementia were characterized using validated, claims-based measures. Associations between baseline frailty and dementia with postoperative care trajectory outcomes were determined using Fine-Gray competing risk models. RESULTS: We identified 3,600 beneficiaries of whom 11.6% were frail and 3.4% met criteria for dementia. Patients with dementia were more likely to be frail, comorbid, and not receive standard-of-care neoadjuvant chemotherapy. Frailty was independently associated with ≥2 transitions in care level after index discharge from RC and skilled nursing facility (SNF) admissions within 1 year of RC, exposure to intensive post-RC interventions, including dialysis and feeding tube placement, and poorer survival. Dementia remained associated with SNF admissions regardless of frailty level. CONCLUSIONS: Among a contemporary cohort of older adults undergoing RC for BC, preoperative dementia and frailty were independently associated with poorer care trajectory beyond the immediate postoperative period after RC. Our work highlights a role for preoperative geriatric assessment in identifying and optimizing patients at greatest risk.